Non-negative Matrix Factorization: A possible way to learn sound dictionaries
نویسندگان
چکیده
Auditory system detects sound signals and uses the temporal-frequency information of the sound signals to conduct sound identification, sound localization, and sound source separation. Thanks to the past studies, we know that the hair cells at cochlea show frequency-dependent responses against input sound signals. But, little is known about how the sound information is conducted to and processed within the auditory cortex. Recently, B.A.Pearlmutter and A.M.Zador proposed an algorithm for monaural source separation under the assumption that the precise head-related transfer function (HRTF) and all the sound dictionary elements are known (ref.[1]). To apply this algorithm to the real sound signals, here I propose that non-negative matrix factorization (NMF) applied to the spectrograms of sound signals would successfully give a set of sound dictionary elements. When NMF was applied to solo-music signals with an appropriate value for the rank of factorization, it could extract instrument-specific patterns of basis spectrograms, each of which has a peak frequency for different notes. Interestingly, the sound signals converted back from the obtained basis spectrograms sounded more or less like the corresponding instrument, which suggests that the obtained basis spectrograms, or basis sound elements, would be a good candidate for the sound dictionary. When NMF was applied to sound signals played with several different instruments, the obtained basis sound elements can be categorized into each instrument-specific pattern by hand. In addition, using the categorized elements, sound signals can be reconstructed corresponding to each instrument part of the original music. The fact that source separation can be done using the basis sound elements obtained by NMF suggests that NMF would be a possible way to learn sound dictionaries from sound signals.
منابع مشابه
Voice-based Age and Gender Recognition using Training Generative Sparse Model
Abstract: Gender recognition and age detection are important problems in telephone speech processing to investigate the identity of an individual using voice characteristics. In this paper a new gender and age recognition system is introduced based on generative incoherent models learned using sparse non-negative matrix factorization and atom correction post-processing method. Similar to genera...
متن کاملNon-negative Hidden Markov Modeling of Audio with Application to Source Separation
In recent years, there has been a great deal of work in modeling audio using non-negative matrix factorization and its probabilistic counterparts as they yield rich models that are very useful for source separation and automatic music transcription. Given a sound source, these algorithms learn a dictionary of spectral vectors to best explain it. This dictionary is however learned in a manner th...
متن کاملDirection of Arrival with One Microphone, a few LEGOs, and Non-Negative Matrix Factorization
Conventional approaches to sound source localization require at least two microphones. It is known, however, that people with unilateral hearing loss can also localize sounds. Monaural localization is possible thanks to the scattering by the head, though it hinges on learning the spectra of the various sources. We take inspiration from this human ability to propose algorithms for accurate sound...
متن کاملA new approach for building recommender system using non negative matrix factorization method
Nonnegative Matrix Factorization is a new approach to reduce data dimensions. In this method, by applying the nonnegativity of the matrix data, the matrix is decomposed into components that are more interrelated and divide the data into sections where the data in these sections have a specific relationship. In this paper, we use the nonnegative matrix factorization to decompose the user ratin...
متن کاملIterative Weighted Non-smooth Non-negative Matrix Factorization for Face Recognition
Non-negative Matrix Factorization (NMF) is a part-based image representation method. It comes from the intuitive idea that entire face image can be constructed by combining several parts. In this paper, we propose a framework for face recognition by finding localized, part-based representations, denoted “Iterative weighted non-smooth non-negative matrix factorization” (IWNS-NMF). A new cost fun...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2005